OPTIMAL CONTROL WITH MODERATION INCENTIVES 

DEBRA LEWIS* 

Abstract. A purely state-dependent cost function can be modified by introducing a control-dependent term rewarding 
submaximal control utilization. A moderation incentive is identically zero on the boundary of the admissible control 
region and non-negative on the interior; it is bounded above by the infimum of the state-dependent cost function, so 
that the instantaneous total cost is always non-negative. The conservation law determined by the Maximum Principle, 
in combination with the condition that the moderation incentive equal zero on the boundary of the admissible control 
region, plays a crucial role in the analysis; in some situations, the initial and final values of the auxiliary variable are 
uniquely determined by the condition that the conserved quantity equal zero along a solution of the arbitrary duration 
synthesis problem. Use of an alternate system of evolution equations, parametrized by the auxiliary variable, for one- 
degree of freedom controlled acceleration systems, can significantly simplify numerical searches for solutions of the arbitrary 
duration synthesis problem. A one-parameter family of 'elliptical' moderation incentives is introduced; the behavior of the 
well-known quadratic control cost and its incentive analog is compared to that of the elliptical incentives in two simple 
controlled acceleration examples. The elliptical incentives yield smooth solutions with controls remaining in the interior of 
the admissible region, while the quadratic incentive allows piecewise smooth solutions with controls moving on and off the 
boundary of the admissible region; in these examples, the arbitrary duration synthesis problem for the traditional quadratic 
control cost has no solution — the total cost is a monotonically decreasing function of the duration. 

1. Introduction. Optimal control problems typically involve constraints on both the state variables 
and control. For example, (bio)mechanical systems cannot generate or withstand arbitrarily large forces 
or accelerations. For some cost functions, trajectories approaching the boundary of the admissible region 
are so extravagant that the boundary can safely be left out of the mathematical model. However, when a 
task is to be executed as quickly as possible, the bounds on the possible play a crucial role in determining 
the optimal process — cost considerations would drive the solution outside the admissible region if these 
bounds were not explicitly imposed.

If the relevant constraints are explicitly incorporated in the state space and admissible control region, 
the cost function for a time minimization problem is constant. Given the degeneracy of the constant cost 
function, the optimal control values are sought on the boundary of the admissible controls set. In some 
situations of interest, geometric optimization and integration methods can be used (see, e.g., [10]) to
work directly on the boundary. If geometric methods are not available or desirable, penalty functions 
can be used to construct algorithms on an ambient vector space that respect the boundary due to the 
prohibitive (possibly infinite) expense of crossing it (see, e.g., [HH], and references therein). In many 
situations, particularly in biological models, a close approach to the boundary of the admissible region 
is undesirable — stresses on joints, muscles, and bones are severe near the breaking or tearing points of 
these structures — but sometimes justified. An animal may be willing to push itself to its physical limits 
to escape a high-risk situation; many machines are engineered to execute certain tasks rapidly, even if 
this involves high energy consumption and rapid wear of mechanical parts. In such cases, selection of an 
appropriate penalty function is essential, as too severe a penalty will yield overly conservative solutions. 

An important class of optimal control problems can be interpreted as modified time minimization 
problems, in which certain states are more costly than others. When modeling a conscious agent, the unit 
cost function of a traditional time minimization problem can be interpreted as representing a uniform 
stress and/or risk throughout the task, while a modified time minimization cost function models a com- 
bination of instantaneous stresses and risks that explicitly depend on the current state. This formulation 
may be more natural than an admissible/inadmissible dichotomy, particularly for biological systems. For 
example, consider the Kane-Scher model of the classic 'falling cat' problem, in which a cat is suspended 
upside down and then released; a typical cat can right itself without net angular momentum from heights
of approximately one meter. (See, e.g., [TH O [TH [10].) Kane and Scher [S] proposed a two rigid body 
model of a cat; to eliminate the mechanically efficient but biologically unacceptable solution in which 
the front and back halves counter-rotate, resulting in a 360° twist in the 'cat', Kane and Scher imposed 
a no-twist condition in their model. However, actual cats can and do significantly twist their bodies; 
replacing the no-twist condition with a deformation-dependent term in the cost function that discourages 
excessively large relative motions allows more realistic motions. We will refer to optimal control problems 
with cost functions depending only on the state variables as modified time minimization problems. 

Given a modified time minimization problem, we are interested in modifying the cost function to take 
the control effort into account. We assume here that a cost function modeling the do-or-die, 'whatever 
it takes' approach is known, and construct a new cost function by subtracting a control-dependent term. 
Our approach is to regard this term not as a penalty or cost, but a deduction rewarding submaximal 
control efforts. Hence we specify that this term equal zero on the boundary of the admissible control set 
and be bounded above by the minimum of the original cost function, so that the total instantaneous cost 
function remains non-negative. We can construct parametrized families of such functions and adjust the 
urgency of the task by adjusting the parameter. The incentive function may allow controls to move on 
and off the boundary of the admissible control region, or may approach zero sufficiently rapidly as the 
control approaches the boundary that controls starting in the interior of the admissible control region 
will remain there throughout the maneuver. 

The notion of a moderation incentive can guide the modification of familiar cost functions. As we shall 
see in the examples, simply modifying the cost function by a constant can make the difference between the 
existence and absence of solutions of the arbitrary duration synthesis problem. The cost functions we use 
here to illustrate this property are quadratic in the control. Quadratic control cost (QCC) minimization, 
with cost functions of the form $C(x,u) = Q_x(u)$ for some smooth family of quadratic forms $Q_x$ determined
by an inner product or Riemannian metric, has played an important role in geometric optimal control 
theory. (See, e.g., [T], [3], [ISl: and references therein.) If the admissible control region is unbounded, 
QCC functions yield relatively simple evolution equations: if the state space is a subset of a Riemannian 
manifold $M$, the space of admissible controls has full rank, $\dot x = u$, and $C(x,u) = \frac{1}{2}|u|^2$, then the traces
in $M$ of the optimal trajectories are geodesics. If the control $u$ is constrained to lie in a distribution of
less than full rank, the corresponding QCC problem leads to sub-Riemannian geometry. (See, e.g. [H]
or ^.) Thus many existing results from geometric mechanics and (sub-)Riemannian geometry can be 
utilized in the analysis of simple QCC control problems. QCC optimization sometimes follows a 'the 
slower, the better' strategy: in some important QCC problems the total cost is a decreasing function of 
the maneuver duration; hence there is no optimal solution of the arbitrary duration QCC problem. If 
there is a range of durations $[T_{\min}, \infty)$ for which unique specified duration QCC solutions exist, then the
QCC trajectory of duration $T_{\min}$ may be of interest as the fastest of the slow. However, it is unclear in
what sense these trajectories are optimal. Modifying the quadratic control cost by a constant, so as to 
satisfy the condition that the moderation incentive equal zero on the boundary of the admissible control 
region, can yield arbitrary duration synthesis problems for which unique solutions do exist in situations 
where the QCC function (nonzero on the boundary of the sphere) lacks such solutions. 

We introduce a one-parameter family of 'elliptical' moderation incentives $\tilde C_\mu : [0,1] \to [0,\mu]$,
$\mu \in (0,1]$, by

(1.1)   $\tilde C_\mu(s) := \mu\sqrt{1-s^2}$.

(The graph of $\mu\sqrt{1-s^2}$, $0 \le s \le 1$, is a segment of an ellipse with semi-axes 1 and $\mu$, and
hence of eccentricity $\sqrt{1-\mu^2}$.) $\tilde C_0$ is the trivial
incentive associated to the unmoderated modified time minimization problem. For $\mu \in (0,1]$, the control
values determined by $\tilde C_\mu$ always lie in the interior of the unit ball; if the unmoderated cost function is
smooth, the state variables and control will also be smooth. (In contrast, some of the solutions for the
quadratic incentive and QCC we find in the examples are only piecewise smooth, moving on and off the
unit sphere.) However, the penalty imposed by $\tilde C_\mu$ is not prohibitive; as we shall see in the examples, we
can come arbitrarily close to the control region boundary by adjusting $\mu$. The elliptical incentives have
some simple properties that make them particularly convenient to work with in certain kinds of analytic
and numerical calculations. Finally, in the examples treated here, there are some qualitative resemblances
between the trajectories determined by the quadratic incentive $\tilde C_q$ and the elliptical incentives for values
of $\mu$ near 1.

We make several simplifying assumptions in the present work. We assume that the admissible control 
region is the Euclidean unit ball and our incentives are nonincreasing functions of the magnitude of the 
control. We restrict our attention to problems in which the state variables consist of a position vector in
$\mathbb{R}^n$ and its first $k-1$ derivatives; the $k$-th derivative is fully controlled and the unmoderated cost function
depends only on the position. These assumptions are not central to the formulation of the moderated 
problem, but they lead to particularly simple expressions in some key constructions. More general control 
systems, with more complex admissible control regions (including ones determined in part by the state 
variables) and more general controls, will be considered in future work. 

We consider two examples that illustrate some of the key features of the moderated control problems 
and suggest directions of future research. The first example is a very simple one-dimensional controlled 
acceleration problem: a particle at rest at one position is to be moved a unit distance by controlling the 
acceleration; the initial and terminal velocities are zero. This classic starter problem is treated in [17], [6],
and other texts. The well-known time minimizing solution is the 'bang-bang' solution, with acceleration 
equal to 1 for the first half of the maneuver and —1 for the second half; the solution of the arbitrary 
duration problem with quadratic incentive has linear acceleration; the solutions for the elliptical incentives 
have smooth accelerations approaching the bang-bang solution as the moderation parameter approaches 
zero, and approaching the QCC solution as the parameter approaches one. 

The second example is a generalization of the first, with a position penalty added to the control 
cost. The position penalty is monotonically decreasing and equals zero at the destination. This example 
can be interpreted as a very simple model of 'spooking' (flight reaction), in which the position penalty 
models aversion to a localized stimulus and the destination is the position at which the animal first 
feels entirely safe or comfortable. The reflectional symmetry seen in the first example is broken: all of 
the cost functions studied here, with the exception of the trivial moderation incentive, yield asymmetric 
solutions, with relatively strong initial accelerations and relatively weak decelerations. The quadratic 
incentive yields only piecewise smooth solutions, while the elliptical incentive solutions are smooth for all 
nonzero values of the moderation parameter. For small-to-middling values of the moderation parameter $\mu$,
the solutions for the elliptical incentives show little response to the intensity of the position penalty — the 
solutions remain close to the corresponding solutions for the corresponding problem without a position 
penalty even when the position penalty is high. Roughly speaking, if little or no incentive to take it easy 
is added to a time-pressured task, the optimal strategy is to get it all over with (almost) as quickly as 
possible; there's little room for modification of the strategy if additional discomfort or risk is introduced. 
On the other hand, if there's a significant reward for moderate effort, the strategy in the absence of a 
position penalty will be to take it slowly, and the introduction of some variable risk or discomfort can yield 
dramatic speed-ups in overall execution times, as well as significant variations in control magnitudes. The 
classic quadratic control cost (QCC) function, which is nonzero on the boundary of the admissible control 
region, determines a total cost that is a monotonically decreasing function of the maneuver duration; as 
the specified duration is increased, the solutions perform an increasing number of oscillations about the 
destination before coming to rest. 

2. The k-th order moderated synthesis problem. We apply Pontryagin's Maximum Principle
to a special class of optimal control problems that illustrate some of the features of moderated control
problems, but are relatively easily analysed. We focus on the $k$-th order evolution equation $x^{(k)} = u$,
$x : [0,t_f] \to \mathbb{R}^n$, with specified initial and final values of the $x^{(j)}$, $j = 0, \ldots, k-1$, and unmoderated
cost function depending only on the position, not the derivatives of $x$. We further simplify the analysis
by assuming that the admissible control region is the unit ball in $\mathbb{R}^n$ and that the cost depends on the
control only through its norm.

Consider a control problem with state variable $z \in \mathbb{R}^m$, control $u \in U \subset \mathbb{R}^k$, evolution equation
$\dot z = V(z,u)$, boundary conditions $z(0) = z_0$ and $z(t_f) = z_f$, and cost function $C : \mathbb{R}^m \times U \to \mathbb{R}$.
Assume that both $V$ and $C$ are continuous, with continuous derivatives with respect to the state variable
$z$; let $\mathcal{A}$ denote the set of triplets $(z,u,t_f)$ such that $t_f > 0$, $(z,u) : [0,t_f] \to \mathbb{R}^m \times U$ satisfies the
evolution equation and boundary conditions, $z$ is continuous with piecewise continuous derivative $\dot z$, and
$u$ is piecewise continuous. Pontryagin's Maximum Principle states that if $(z,u,t_f) \in \mathcal{A}$ minimizes the total
cost over $\mathcal{A}$, i.e.

$\int_0^{t_f} C(z(t),u(t))\,dt = \min_{(\tilde z,\tilde u,\tilde t_f) \in \mathcal{A}} \int_0^{\tilde t_f} C(\tilde z(t),\tilde u(t))\,dt$,

then there is a continuous curve $\psi : [0,t_f] \to \mathbb{R}^m$ and constant $\phi \ge 0$ such that

(2.1)   $\dot z = \frac{\partial H_\phi}{\partial \psi}(z,\psi,u)$,   $\dot\psi = -\frac{\partial H_\phi}{\partial z}(z,\psi,u)$,   and   $H_\phi(z,\psi,u) = \max_{v \in U} H_\phi(z,\psi,v)$

for $H_\phi(z,\psi,u) := \langle \psi, V(z,u) \rangle - \phi\, C(z,u)$. The Hamiltonian $H_\phi$ is constant along a curve satisfying (2.1);
if the curve minimizes the total cost over $\mathcal{A}$, then the constant is zero. (See [17] for the precise statement and proof
of the Maximum Principle.)

Pontryagin's conditions are necessary, but not sufficient, for optimality; their appeal lies in their 
constructive nature: known results and techniques for boundary value problems and Hamiltonian systems 
can be used in constructing the triplets $(z,u,t_f)$ satisfying Pontryagin's conditions. This construction is
referred to as the synthesis problem in [17]. We will restrict our attention to the synthesis problem, setting
aside the rigorous analysis of actual optimality of the trajectories we obtain. It suffices to consider the
cases $\phi = 1$ and $\phi = 0$, since we can rescale $\psi$ by $\phi \ne 0$. $H_0$ is clearly independent of the cost function $C$;
Hamilton's equations for $H_0$ equal those for a constant cost function, used in determining the
minimum-time admissible curves. Hence we will focus on $H_1$, simply noting that when searching for the optimal
solution, it is necessary to consider the possibility that the total cost is minimized by the minimum 
duration trajectory. 

Here we consider control problems of the form $x^{(k)} = u$, where $x : [0,t_f] \to \mathbb{R}^n$ and
$u : [0,t_f] \to B^n = \{u \in \mathbb{R}^n : |u| \le 1\}$, for some (possibly specified) $t_f > 0$, with specified boundary values $x^{(j)}(0)$ and
$x^{(j)}(t_f)$, $j = 0, \ldots, k-1$. We restrict our attention to cost functions that are the difference of a term
depending only on the position $x$ and a term depending only on the magnitude of the control. We assume
that the position-dependent term is bounded below (for simplicity, we take the bound to be 1), and the
control-dependent term has range contained within [0,1]; thus the instantaneous cost is non-negative.
Given $C \in C^1(\mathbb{R}^n, [1,\infty))$ and $\tilde C \in C^0([0,1],[0,1])$, we seek $x$ with piecewise continuous $k$-th derivative
and piecewise continuous $u$ minimizing the total cost

$\int_0^{t_f} \left( C(x(t)) - \tilde C(|u(t)|) \right) dt$



over all such state/control variable pairs. 

To apply the Pontryagin Maximum Principle to our control problem, we first convert the $k$-th order
evolution equation $x^{(k)} = u$ into a first order system of ODEs by introducing the auxiliary state variables
$d_j := x^{(j)}$, $j = 1, \ldots, k-1$, and setting $z = (x, d_1, \ldots, d_{k-1}) \in (\mathbb{R}^n)^k \approx \mathbb{R}^{nk}$. The resulting first order
evolution equation is

(2.2)   $\dot z = V(z,u) := (d_1, \ldots, d_{k-1}, u)$.

If we let $\psi = (\kappa_1, \ldots, \kappa_{k-1}, \lambda) \in (\mathbb{R}^n)^k \approx \mathbb{R}^{nk}$, then $H_1$ is equivalent to the Hamiltonian
$H : (\mathbb{R}^n)^{2k+1} \to \mathbb{R}$ given by

(2.3)   $H(x, d_1, \ldots, d_{k-1}, \kappa_1, \ldots, \kappa_{k-1}, \lambda, u) := \sum_{j=1}^{k-1} \langle \kappa_j, d_j \rangle + \langle \lambda, u \rangle + \tilde C(|u|) - C(x)$.



The control $u$ is chosen at each time $t$ so as to maximize the Hamiltonian. Since (2.3) depends on $u$
and $\lambda$ only through the term $\langle \lambda, u \rangle + \tilde C(|u|)$, the optimal value of $u$ satisfies $|\lambda|\, u = |u|\, \lambda$, and

(2.4)   $H(x, d_1, \ldots, d_{k-1}, \kappa_1, \ldots, \kappa_{k-1}, \lambda, u) = \max_{|v| \le 1} H(x, d_1, \ldots, d_{k-1}, \kappa_1, \ldots, \kappa_{k-1}, \lambda, v)$

if and only if

(2.5)   $|u|\,|\lambda| + \tilde C(|u|) = \chi(\lambda) := \max_{0 \le \sigma \le 1} \left( \sigma |\lambda| + \tilde C(\sigma) \right)$.

If $\tilde C(0) > \tilde C(s)$ for all $s \in (0,1]$, then $\lambda = 0$ implies $u = 0$; if $\tilde C$ achieves its maximum at any point other
than the origin, $u$ is not uniquely determined when $\lambda = 0$. (In most of the cases considered here, $u$ is
uniquely determined at $\lambda = 0$, but this is not true for the time minimization problem.)
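Hamilton's equations

$\dot x = \frac{\partial H}{\partial \kappa_1} = d_1$,   $\dot\kappa_1 = -\frac{\partial H}{\partial x} = \nabla C(x)$,

$\dot d_{j-1} = \frac{\partial H}{\partial \kappa_j} = d_j$,   $\dot\kappa_j = -\frac{\partial H}{\partial d_{j-1}} = -\kappa_{j-1}$,   $j = 2, \ldots, k-1$,

$\dot d_{k-1} = \frac{\partial H}{\partial \lambda} = u$,   $\dot\lambda = -\frac{\partial H}{\partial d_{k-1}} = -\kappa_{k-1}$

for the Hamiltonian (2.4) are equivalent to $x^{(j)} = d_j$ and $\lambda^{(j)} = (-1)^j \kappa_{k-j}$, $j = 1, \ldots, k-1$, $x^{(k)} = u$,
and $\lambda^{(k)} = (-1)^{k-1} \nabla C(x)$. Inserting these expressions into the Hamiltonian (2.3) yields

(2.6)   $\chi(\lambda) - C(x) + \sum_{j=1}^{k-1} (-1)^j \left\langle x^{(k-j)}, \lambda^{(j)} \right\rangle$.

Pontryagin's Maximum Principle implies that (2.6) is constant along a curve $(x, \lambda, u)$ satisfying Hamilton's
equations and maximizing the Hamiltonian.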
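
The conservation of (2.6) can be checked numerically in the simplest case $k = 2$, $n = 1$, where it reads $\chi(\lambda) - C(x) - \dot x \dot\lambda$. The following is a minimal sketch, not part of the original analysis: it assumes the elliptical potential $\chi_\mu(\lambda) = \sqrt{\mu^2 + \lambda^2}$ of (2.9), a hypothetical position cost $C(x) = 1 + x^2/2$ chosen only for illustration, and an arbitrary initial state; boundary conditions are ignored, since only the conservation law is being illustrated.

```python
import math

MU = 0.5  # moderation parameter (illustrative choice)


def potential(lam):
    """Elliptical moderation potential chi_mu(lambda) = sqrt(mu^2 + lambda^2)."""
    return math.hypot(MU, lam)


def position_cost(x):
    """Hypothetical position cost C(x) = 1 + x^2/2 (bounded below by 1)."""
    return 1.0 + 0.5 * x * x


def rhs(state):
    """First order form of x'' = chi'(lambda), lambda'' = -C'(x)."""
    x, v, lam, p = state  # v = x', p = lambda'
    return (v, lam / potential(lam), p, -x)


def rk4_step(state, dt):
    """One classical Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def conserved_quantity(state):
    """The k = 2 instance of (2.6): chi(lambda) - C(x) - x' * lambda'."""
    x, v, lam, p = state
    return potential(lam) - position_cost(x) - v * p


def max_drift(state, dt=1e-3, steps=1000):
    """Largest deviation of the conserved quantity along the numerical flow."""
    start = conserved_quantity(state)
    drift = 0.0
    for _ in range(steps):
        state = rk4_step(state, dt)
        drift = max(drift, abs(conserved_quantity(state) - start))
    return drift
```

For example, `max_drift((0.0, 0.2, 0.7, -0.4))` stays at the level of the integrator's truncation error, which is what the conservation law predicts.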

Definition 2.1. $x \in C^{k-1}([0,t_f], \mathbb{R}^n)$, with piecewise continuous k-th derivative, satisfying the given
boundary conditions on $x^{(j)}(0)$ and $x^{(j)}(t_f)$, $j = 0, \ldots, k-1$, is a solution of the k-th order synthesis
problem if there exists $\lambda : [0,t_f] \to \mathbb{R}^n$ satisfying

(2.7)   $\lambda^{(k)} = (-1)^{k-1} \nabla C(x)$   and   $|x^{(k)}|\,|\lambda| + \tilde C(|x^{(k)}|) = \chi(\lambda)$

for $0 \le t \le t_f$. If $x^{(k)}$ is discontinuous at $t_*$, then $x^{(k)}(t_*)$ agrees with the left hand limit, i.e.
$x^{(k)}(t_*) = \lim_{t \to t_*^-} x^{(k)}(t)$.
If, in addition, (2.6) equals zero along the curve $(x, \lambda)$, $x$ is a solution of the k-th order synthesis
problem of arbitrary duration.

We follow the convention of [17] in specifying that $x^{(k)}$ is continuous to the left at a discontinuity;
one could as well choose the right hand limit.

We introduce a class of functions $\tilde C$ for which the optimal value of the control $u$ is explicitly given
as the gradient of a function of $\lambda$ when $\lambda \ne 0$, determining a system of $k$-th order ODEs
for $x$ and $\lambda$. The functions equal zero on the sphere $S^{n-1}$ bounding the admissible control region; the
instantaneous control cost equals the position-dependent term at peak control values.

Definition 2.2. If $\tilde C \in C^0([0,1],[0,1])$ is differentiable on (0,1), $\tilde C(1) = 0$, and there is a unique
non-decreasing function $\sigma \in C^0(\mathbb{R}^+, (0,1])$, differentiable on $\sigma^{-1}(0,1)$, satisfying

(2.8)   $\sigma(s)\, s + \tilde C(\sigma(s)) = \tilde\chi(s) := \max_{0 \le \sigma \le 1} \left( \sigma s + \tilde C(\sigma) \right)$,

we say that $\tilde C$ is a moderation incentive, with moderation potential $\chi : \mathbb{R}^n \to [0,\infty)$ given by
$\chi(\lambda) := \tilde\chi(|\lambda|)$.

Lemma 1. If $\tilde C$ is a moderation incentive, then $\tilde\chi$ is continuously differentiable and strictly increasing
on $\mathbb{R}^+$, with $\tilde\chi' = \sigma$.

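Proof. If $\sigma(s) < 1$, then $\sigma(s)$ is a critical point of $\sigma \mapsto \sigma s + \tilde C(\sigma)$, and hence $s = -\tilde C'(\sigma(s))$. In
addition, $\tilde\chi$ is differentiable at $s$, with derivative

$\tilde\chi'(s) = \sigma(s) + \left( s + \tilde C'(\sigma(s)) \right) \sigma'(s) = \sigma(s)$.

If $\sigma^{-1}(1) \ne \emptyset$, then $\sigma^{-1}(1) = [s_*, \infty)$ for some $s_*$, since $\sigma$ is non-decreasing. Since $\tilde\chi(s) = s + \tilde C(1) = s$
for $s > s_*$, $\tilde\chi$ is clearly differentiable and satisfies $\tilde\chi'(s) = 1 = \sigma(s)$ for $s > s_*$. Continuity of $\sigma$ and the
Mean Value Theorem imply that $\tilde\chi$ is differentiable at $s_*$, with $\tilde\chi'(s_*) = 1 = \sigma(s_*)$. □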
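
As a quick sanity check of Lemma 1, the identity $\tilde\chi' = \sigma$ can be tested by finite differences. The sketch below is illustrative only; it anticipates the quadratic incentive treated in Proposition 1, for which the maximizer is $\sigma(s) = \min\{s, 1\}$ and the scalar potential is $\frac{1}{2}(1 + s^2)$ for $s \le 1$ and $s$ for $s > 1$. The function names and the step size are assumptions of this example.

```python
def sigma_q(s):
    """Maximizer of sigma*s + (1 - sigma^2)/2 over 0 <= sigma <= 1."""
    return min(s, 1.0)


def chi_q(s):
    """Scalar moderation potential of the quadratic incentive."""
    return 0.5 * (1.0 + s * s) if s <= 1.0 else s


def central_difference(f, s, h=1e-6):
    """Symmetric finite-difference approximation of f'(s)."""
    return (f(s + h) - f(s - h)) / (2.0 * h)
```

The point $s = 1$ is the interesting one: the potential is only $C^1$ there, yet the difference quotient still converges to $\sigma(1) = 1$, as the lemma asserts.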

We focus our attention on a one-parameter family of moderation incentives, which includes the trivial
incentive $\tilde C_0 = 0$, and a quadratic polynomial moderation incentive that differs from a kinetic energy
term by a constant in the case of controlled velocity.

Proposition 1. The functions $\tilde C_\mu(s) = \mu\sqrt{1-s^2}$, $0 \le \mu \le 1$, are moderation incentives, with
moderation potentials

(2.9)   $\chi_\mu(\lambda) = \sqrt{\mu^2 + |\lambda|^2}$.

$\chi_\mu \in C^\infty(\mathbb{R}^n, \mathbb{R}^+)$ if $\mu > 0$; $\chi_0$ is continuously differentiable everywhere except at the origin, where the
equation determining the optimal control value is completely degenerate.

The quadratic polynomial $\tilde C_q(s) = \frac{1}{2}(1 - s^2)$ is a moderation incentive function, with moderation
potential

(2.10)   $\chi_q(\lambda) = \begin{cases} \frac{1}{2}(1 + |\lambda|^2), & |\lambda| \le 1 \\ |\lambda|, & |\lambda| > 1. \end{cases}$

$\chi_q \in C^1(\mathbb{R}^n, \mathbb{R}^+)$.

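Proof. If $\mu > 0$, differentiating

$\sigma s + \tilde C_\mu(\sigma) = \sigma s + \mu\sqrt{1 - \sigma^2}$

with respect to $\sigma$ yields the criticality condition $s = \mu\sigma / \sqrt{1 - \sigma^2}$, with unique solution

(2.11)   $\sigma_\mu(s) = \frac{s}{\sqrt{\mu^2 + s^2}}$.

The inequality

$\sigma_\mu(s)\, s + \tilde C_\mu(\sigma_\mu(s)) = \sqrt{\mu^2 + s^2} \ge \max\{\mu, s\} = \max\{\tilde C_\mu(0),\ s + \tilde C_\mu(1)\}$

for $\mu > 0$ and $s \ge 0$ implies that (2.11) is the optimal value of $\sigma$, and hence $\chi_\mu$ is given by (2.9).
$\sigma s + \tilde C_0(\sigma) = \sigma s$ achieves its maximum $s = \sqrt{0^2 + s^2}$ at the boundary point $\sigma = 1$.
We now consider the quadratic polynomial $\tilde C_q$: $\sigma s + \tilde C_q(\sigma) = \sigma s + \frac{1}{2}(1 - \sigma^2)$
achieves its maximum on [0,1] at $\sigma = \min\{s, 1\}$. □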
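
The closed forms (2.9) and (2.10) can be compared against a direct maximization of $\sigma s + \tilde C(\sigma)$ over $0 \le \sigma \le 1$. The following sketch uses a brute-force grid search; the grid size and the function names are assumptions made for this illustration and play no role in the proof.

```python
import math


def elliptical_incentive(sigma, mu):
    """Elliptical moderation incentive mu * sqrt(1 - sigma^2)."""
    return mu * math.sqrt(max(0.0, 1.0 - sigma * sigma))


def quadratic_incentive(sigma):
    """Quadratic moderation incentive (1 - sigma^2) / 2."""
    return 0.5 * (1.0 - sigma * sigma)


def potential_by_grid(incentive, s, n=20001):
    """Brute-force maximum of sigma*s + incentive(sigma) on a uniform grid."""
    grid = (i / (n - 1) for i in range(n))
    return max(sigma * s + incentive(sigma) for sigma in grid)


def elliptical_potential(s, mu):
    """Closed form (2.9) for scalar argument s >= 0."""
    return math.hypot(mu, s)


def quadratic_potential(s):
    """Closed form (2.10) for scalar argument s >= 0."""
    return 0.5 * (1.0 + s * s) if s <= 1.0 else s
```

For instance, `potential_by_grid(lambda t: elliptical_incentive(t, 0.25), 2.0)` agrees with `elliptical_potential(2.0, 0.25)` to within the grid resolution.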

The evolution equations for $x$ and $\lambda$ in the synthesis problem associated to a moderation incentive
are a pair of $k$-th order skewed gradient equations.

Proposition 2. Let $C \in C^1(\mathbb{R}^n, [1,\infty))$ and $\tilde C$ be a moderation incentive, with moderation potential
$\chi$. $x : [0,t_f] \to \mathbb{R}^n$ satisfying the given boundary conditions is a solution of the synthesis problem for the
cost function $C(x) - \tilde C(|u|)$ if and only if there exists $\lambda : [0,t_f] \to \mathbb{R}^n$ satisfying

(2.12)   $x^{(k)} = \nabla\chi(\lambda)$   and   $\lambda^{(k)} = (-1)^{k-1} \nabla C(x)$.

If, in addition, the conserved quantity (2.6) equals zero, $(x, \lambda)$ is a solution of the arbitrary duration
synthesis problem.

If $\lambda(t_*) = 0$ at some time $t_*$ and $\nabla\chi(0)$ is undefined, the first equation in (2.12) is replaced with

$x^{(k)}(t_*) = \lim_{t \to t_*^-} \nabla\chi(\lambda(t))$.



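Proof. Assume that $x$ is a solution of the synthesis problem, with auxiliary function $\lambda$. If $\lambda \ne 0$,
uniqueness of the maximizer $\sigma$ and Lemma 1 imply that

$x^{(k)} = \sigma(|\lambda|) \frac{\lambda}{|\lambda|} = \tilde\chi'(|\lambda|) \frac{\lambda}{|\lambda|} = \nabla\chi(\lambda)$.

On the other hand, if $\lambda$ exists such that $(x, \lambda)$ satisfies (2.12), then $x^{(k)} = \nabla\chi(\lambda)$, and the same argument
shows that (2.7) is satisfied.

Since $\tilde\chi$ is $C^1$ on $\mathbb{R}^+$, and hence $\chi$ is $C^1$ on $\mathbb{R}^n \setminus \{0\}$, the left handed limit $\lim_{t \to t_*^-} \nabla\chi(\lambda(t))$ is
well-defined when $\lambda(t_*) = 0$, and equals $\lim_{t \to t_*^-} x^{(k)}(t)$. □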

If $\nabla\chi$ is invertible, then the first equation in (2.12) can be solved for $\lambda$. For example, the elliptic
moderation incentives have a potential with invertible gradient:

$\nabla\chi_\mu(\lambda) = \frac{\lambda}{\sqrt{\mu^2 + |\lambda|^2}}$   with   $(\nabla\chi_\mu)^{-1}(u) = \frac{\mu\, u}{\sqrt{1 - |u|^2}}$.

In this case, (2.12) is equivalent to a $2k$-th order ODE in $x$.
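
The inverse relation between the control and the auxiliary variable for the elliptical incentives is easy to verify numerically. The sketch below, with vectors represented as plain Python lists (an assumption of this example), maps $\lambda$ to the control $u = \nabla\chi_\mu(\lambda)$ and back.

```python
import math


def grad_potential(lam, mu):
    """Gradient of chi_mu: lambda / sqrt(mu^2 + |lambda|^2)."""
    norm = math.sqrt(mu * mu + sum(c * c for c in lam))
    return [c / norm for c in lam]


def inverse_grad_potential(u, mu):
    """Inverse of the gradient: mu * u / sqrt(1 - |u|^2), defined for |u| < 1."""
    squared_norm = sum(c * c for c in u)
    if squared_norm >= 1.0:
        raise ValueError("control must lie in the interior of the unit ball")
    scale = mu / math.sqrt(1.0 - squared_norm)
    return [scale * c for c in u]
```

Note that for $\mu > 0$ the control returned by `grad_potential` always has norm strictly less than one, which is the interior property of the elliptical incentives mentioned in the introduction.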

Corollary 1. Let $C \in C^1(\mathbb{R}^n, [1,\infty))$ and $\tilde C$ be a moderation incentive with moderation potential $\chi$.
If $\nabla\chi$ is defined everywhere and invertible, then $x : [0,t_f] \to \mathbb{R}^n$ satisfying the given boundary conditions
is a solution of the synthesis problem for the cost function $C(x) - \tilde C(|u|)$ if and only if

(2.13)   $\frac{d^k}{dt^k} \left( (\nabla\chi)^{-1}(x^{(k)}) \right) = (-1)^{k-1} \nabla C(x)$

for $0 \le t \le t_f$. If, in addition,

(2.14)   $C(x) = \chi\left( (\nabla\chi)^{-1}(x^{(k)}) \right) + \sum_{j=1}^{k-1} (-1)^j \left\langle x^{(k-j)}, \frac{d^j}{dt^j} (\nabla\chi)^{-1}(x^{(k)}) \right\rangle$,

$x$ is a solution of the arbitrary duration synthesis problem.

3. One dimensional controlled acceleration systems. In the arbitrary duration problem the 
final time tf is generally not known a priori; if a closed form expression for the solutions of the synthesis 
problem cannot be found, an iterative numerical procedure may be needed to find the appropriate $t_f$. In
some situations it may be both possible and desirable to reformulate the problem to avoid this difficulty. 
As an example, we present an approach suitable for a class of one dimensional controlled acceleration 
problems. 

If A is known a priori to be nonzero, we can reparametrize the evolution equations and use the 
conservation of (2.6) to replace the pair of autonomous second order equations (2.12) with a pair of
first order nonautonomous ODEs, with independent variable A, and a subordinate first order equation 
relating A and t. The induced boundary conditions for this problem may be more convenient than those 
of the original synthesis problem. As we shall show below, the pair of first order ODEs for x and the 
auxiliary variable q can be formulated without a priori knowledge of the behavior of A; if a solution (x, q) 
is found that satisfies the relevant equalities and inequalities, it determines a solution of the synthesis 
problem. 

Proposition 3. If

(i) there are functions $r : [\lambda_0, \lambda_f] \to \mathbb{R}$ and $q : [\lambda_0, \lambda_f] \to \mathbb{R}^+$ satisfying the evolution equations

(3.1)   $q\, r' + C \circ r - \chi = $ constant,   and   $q' = -2\, C'(r)$,

and the boundary conditions $r(\lambda_0) = x_0$, $r(\lambda_f) = x_f$, $\sqrt{q(\lambda_0)}\, r'(\lambda_0) = v_0\, \mathrm{sgn}(\lambda_f - \lambda_0)$, and
$\sqrt{q(\lambda_f)}\, r'(\lambda_f) = v_f\, \mathrm{sgn}(\lambda_f - \lambda_0)$;

(ii) the solution of the IVP

(3.2)   $\dot\lambda = \mathrm{sgn}(\lambda_f - \lambda_0) \sqrt{q(\lambda)}$   and   $\lambda(0) = \lambda_0$

passes through $\lambda_f$ at some positive time $t_f$,

then $x = r \circ \lambda : [0,t_f] \to \mathbb{R}$ is a solution of the one dimensional controlled acceleration synthesis problem
with boundary data $x(0) = x_0$, $x(t_f) = x_f$, $\dot x(0) = v_0$, and $\dot x(t_f) = v_f$. If the constant in (3.1) is zero,
then $x$ is a solution of the arbitrary duration synthesis problem.
Proof. Differentiation of $\dot\lambda = \mathrm{sgn}(\lambda_f - \lambda_0) \sqrt{q}$ yields

$\ddot\lambda = \mathrm{sgn}(\lambda_f - \lambda_0) \frac{q'}{2\sqrt{q}}\, \dot\lambda = \frac{q'}{2} = -C'(r)$.

Differentiation of the first equation in (3.1) yields

$0 = (q\, r' + C \circ r - \chi)' = q\, r'' + q'\, r' + (C \circ r)' - \chi' = q\, r'' - (C \circ r)' - \chi'$.

Hence

$\frac{d^2}{dt^2}(r \circ \lambda) = r''\, \dot\lambda^2 + r'\, \ddot\lambda = q\, r'' - (C \circ r)' = \chi'$.

Thus $(r \circ \lambda, \lambda)$ satisfy the evolution equations (2.12). The boundary conditions in (i) guarantee that $r \circ \lambda$
satisfies the boundary conditions of the synthesis problem.

If the constant in the first equation in (3.1) is zero, then

$\chi = q\, r' + C \circ r = \dot\lambda^2\, r' + C \circ r = \dot\lambda\, \dot x + C \circ x$,

and hence (2.6) is zero. □

Fig. 3.1. Solutions of the arbitrary duration synthesis problem for the time minimization problem, with trivial
moderation incentive and duration $t_f = 2$, and for the quadratic moderation incentive $\tilde C_q$, with $t_f = \sqrt{6}$. Left: acceleration $\ddot x(t)$;
right: position $x(t)$.



In the arbitrary duration case, the velocity initial condition is satisfied if and only if 



(3.3) 



x(Ao) = C(a;o) t'o = 

= «o and x'(Ao) -^0(^0 - A/) > wo 7^ 



entirely analogous conditions hold for the terminal velocity. For example, if the initial and terminal
velocities are both zero and $\tilde\chi$ is one-to-one, then $|\lambda_0| = \tilde\chi^{-1}(C(x_0))$ and $|\lambda_f| = \tilde\chi^{-1}(C(x_f))$.



Remark: If it is known a priori that a solution $x$ of the synthesis problem must satisfy $\dot x \ne 0$ for all $t$, we
can reduce the fourth order system (2.13) to a third order one by solving (2.14) for obtaining 



and hence 



dt 



VT 



C(x), 



1-P C{X) 



3.1. Warm-up example: constant cost controlled acceleration. As a simple illustrative
example, we consider the one dimensional controlled acceleration problem $\ddot x = u$, with boundary conditions

(3.4)   $x(0) = 0$,   $x(t_f) = 1$,   and   $\dot x(0) = \dot x(t_f) = 0$

for some final time $t_f > 0$, and constant cost function $C \equiv 1$. The trivial incentive version of this problem
appears as the introductory example in [17], and in several other control texts. The 'bang-bang'
solution

(3.5)   $x(t) = t + \frac{1}{2}\left( |1-t|\,(1-t) - 1 \right) = \begin{cases} \frac{1}{2} t^2, & 0 \le t \le 1 \\ 2t - 1 - \frac{1}{2} t^2, & 1 < t \le 2, \end{cases}$

with peak control effort throughout the maneuver, is optimal; see, e.g. [6]. Given a moderation incentive
$\tilde C : [0,1] \to [0,1]$, we seek a solution of the synthesis problem with boundary conditions (3.4) and
unmoderated cost $C \equiv 1$. Our treatment is a straightforward application of Proposition 3.
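
Proposition 4. If $\tilde C$ is a moderation incentive, then for any $\lambda_0 > 0$,

(3.6)   $q = 2 \int_0^{\lambda_0} \left( \chi(\lambda_0) - \chi(s) \right) ds$   and   $r(\lambda) = \frac{1}{q} \int_\lambda^{\lambda_0} \left( \chi(\lambda_0) - \chi(|s|) \right) ds$

determine a solution

$x(t) := r\left( \lambda_0 \left( 1 - \frac{2t}{t_f} \right) \right)$,   $0 \le t \le t_f$,   $t_f = \frac{2\lambda_0}{\sqrt{q}}$,

of the synthesis problem with boundary conditions (3.4) and cost function $1 - \tilde C$. If $\chi(\lambda_0) = 1$, then $x$ is
also a solution of the arbitrary duration synthesis problem.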

Proof. The constant $q$ and function $r$ given by (3.6) clearly satisfy the evolution equations $q' = 0 = -2\, C'(x)$ and

$q\, r' + C \circ r - \chi = \chi(|\lambda|) - \chi(\lambda_0) + 1 - \chi(\lambda) = 1 - \chi(\lambda_0)$

and boundary conditions $r(\lambda_0) = 0$, $r(-\lambda_0) = 1$, and $r'(\pm\lambda_0) = 0$. The auxiliary function
$\lambda(t) := \lambda_0 (1 - 2t/t_f)$ satisfies $\lambda(0) = \lambda_0$, $\lambda(t_f) = -\lambda_0$ and $\dot\lambda = -2\lambda_0/t_f = -\sqrt{q} = \mathrm{sgn}(\lambda_f - \lambda_0) \sqrt{q}$. Hence Proposition 3
implies that $x$ is a solution of the synthesis problem. □
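
The construction in Proposition 4 lends itself to a direct numerical sketch: given a scalar potential and a value of $\lambda_0$, compute $q$ and $t_f$ by quadrature and evaluate $x(t)$. The code below is an illustration under the reading of (3.6) in which $r$ is normalized by $q$; the Simpson rule, the function names, and the splitting of the integral at $\lambda = 0$ (to avoid integrating across the kink of $\chi(|s|)$) are choices made for this example.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def synthesis_solution(potential, lam0):
    """Return (q, t_f, x) for the constant-cost problem with the given potential."""

    def gap(s):
        return potential(lam0) - potential(abs(s))

    half = simpson(gap, 0.0, lam0)
    q = 2.0 * half
    t_f = 2.0 * lam0 / math.sqrt(q)

    def x(t):
        lam = lam0 * (1.0 - 2.0 * t / t_f)  # auxiliary variable at time t
        if lam >= 0.0:
            area = simpson(gap, lam, lam0)
        else:
            area = half + simpson(gap, lam, 0.0)
        return area / q

    return q, t_f, x


def trivial_potential(s):
    """Potential of the trivial incentive (time minimization)."""
    return abs(s)


def quadratic_potential(s):
    """Scalar potential of the quadratic incentive."""
    return 0.5 * (1.0 + s * s) if abs(s) <= 1.0 else abs(s)
```

With `trivial_potential` the duration is 2 for every positive $\lambda_0$ and `x` reproduces the bang-bang solution (3.5); with `quadratic_potential` and $\lambda_0 = 1$ the duration is $\sqrt{6}$.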

We now apply Proposition 4 to the trivial, quadratic, and elliptical incentives. For the time
minimization problem, with trivial incentive $\tilde C_{tm} = 0$, we have $\chi_{tm}(s) = s$, and hence

$r_{tm}(\lambda) = \lambda_f \lambda + \frac{1}{2}\left( \lambda_f^2 - |\lambda| \lambda \right)$   and   $t_f = \frac{2\lambda_f}{\sqrt{r_{tm}(\lambda_f)}} = 2$.

Thus we obtain the well-known 'bang-bang' solution (3.5) for any positive $\lambda_f$. The total cost is simply
the duration, 2. Note that the condition $\chi(\lambda_f) = 1$ is not necessary: the criticality (with respect to
duration) condition leading to $\chi(\lambda_f) = 1$ need not be satisfied at the end point of the range of possible
durations.

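The quadratic incentive $\tilde C_q(s) = \frac{1}{2}(1 - s^2)$ has the piecewise smooth scalar incentive potential

$\tilde\chi_q(s) = \begin{cases} \frac{1}{2}(1 + s^2), & 0 \le s \le 1 \\ s, & s > 1. \end{cases}$

It follows that if $0 < \lambda_0 \le 1$, then (3.6) takes the form

$r_q(\lambda) = \frac{1}{2} - \frac{3\lambda}{4\lambda_0} + \frac{\lambda^3}{4\lambda_0^3}$   and   $q = \frac{2\lambda_0^3}{3}$,

and hence

(3.7)   $x_q(t; t_f) = \left( 3 - 2\frac{t}{t_f} \right) \left( \frac{t}{t_f} \right)^2$,   $t_f = \sqrt{6/\lambda_0} \ge \sqrt{6}$.

The arbitrary duration synthesis problem solution is given by $\lambda_0 = 1$; note that $t_f = \sqrt{6}$ is the minimum
duration for the smooth $\tilde C_q$ solutions. (If $\lambda_0 > 1$, the corresponding solutions of the synthesis problem
are piecewise smooth; we do not construct these solutions here.)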

Fig. 3.2. Elliptical moderation incentive (EMI) solutions for four sample values of $\mu$, ranging from near 0 to near 1,
plotted with respect to the rescaled time $s := t/t_f(\mu)$. Upper left: differences between EMI solutions $x_\mu(s\, t_f(\mu))$ and the time minimizing solution
$x_0(s)$; upper right: differences between EMI solutions $x_\mu(s\, t_f(\mu))$ and the rescaled time quadratic incentive solution $x_q(s)$;
lower left: velocities $\dot x_\mu(s\, t_f(\mu))$; lower right: accelerations $\ddot x_\mu(s\, t_f(\mu))$.



The total cost for the trajectory (3.7) is

\[ \mathrm{cost}(t_f) = \int_0^{t_f} \bigl(1 - C_q(\ddot x(t; t_f))\bigr)\,dt = \frac{1}{2}\int_0^{t_f} \bigl(1 + \ddot x(t; t_f)^2\bigr)\,dt = \frac{t_f}{2} + \frac{6}{t_f^3} \]

if $t_f \ge \sqrt{6}$ (and hence $\lambda_0 \le 1$). Note that if we replace the moderated cost function $1 - C_q(u) = \frac{1}{2}(1 + u^2)$ with the kinetic energy-style cost function $\frac{1}{2}u^2$, the evolution equation is unchanged, but the total cost $6/t_f^3$ is now a strictly decreasing function of the maneuver duration $t_f$; hence none of the solutions of this synthesis problem are solutions of the arbitrary duration problem. The convention that the moderation incentive equal zero on the boundary of the admissible control region determines the constant that 'selects' the fastest smooth solution of the evolution equation as the solution of the arbitrary duration problem.
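
The cost computation for the cubic trajectory can be checked numerically. The following is a minimal Python sketch, not taken from the paper: the function names are illustrative, and the trajectory is assumed to be the cubic $x_q(t; t_f) = (t/t_f)^2(3 - 2t/t_f)$ with moderated running cost $\frac{1}{2}(1 + \ddot x^2)$.

```python
import math

def acc_q(t, tf):
    """Acceleration of the assumed cubic trajectory x_q(t; tf) = (t/tf)^2 (3 - 2 t/tf)."""
    return (6.0 / tf**2) * (1.0 - 2.0 * t / tf)

def cost_closed_form(tf):
    """Closed form tf/2 + 6/tf^3 of the moderated total cost."""
    return tf / 2.0 + 6.0 / tf**3

def cost_numeric(tf, n=2000):
    """Composite Simpson approximation of the integral of (1 + acc^2)/2 over [0, tf]."""
    h = tf / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        a = acc_q(i * h, tf)
        total += weight * 0.5 * (1.0 + a * a)
    return total * h / 3.0

if __name__ == "__main__":
    t_min = math.sqrt(6.0)
    print("numeric vs closed form at tf = 3:", cost_numeric(3.0), cost_closed_form(3.0))
    print("cost at tf = sqrt(6):", cost_closed_form(t_min))
    print("initial acceleration at tf = sqrt(6):", acc_q(0.0, t_min))
```

The critical point of $t_f/2 + 6/t_f^3$ sits at $t_f = \sqrt{6}$, which is also the duration at which the initial acceleration of the cubic reaches the control bound 1.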

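We now turn to the elliptic moderation incentives $C_\mu(s) = \mu\sqrt{1 - s^2}$, $0 < \mu < 1$, with $\chi_\mu(s) = \sqrt{\mu^2 + s^2}$. (Note that $C_0 = C_{tm}$.) If we let

\[ g_\mu(\lambda) := \frac{1}{2}\left(\lambda\,\chi_\mu(\lambda) + \mu^2 \ln\bigl(\chi_\mu(\lambda) + \lambda\bigr)\right) \]

denote an anti-derivative of $\chi_\mu$, then

\[ r_\mu(\lambda) = \chi_\mu(\lambda_f)(\lambda + \lambda_f) + g_\mu(-\lambda_f) - g_\mu(\lambda) = \frac{1}{2}\left(\chi_\mu(\lambda_f)(\lambda + \lambda_f) + \bigl(\chi_\mu(\lambda_f) - \chi_\mu(\lambda)\bigr)\lambda - \mu^2 \ln\frac{\chi_\mu(\lambda) + \lambda}{\chi_\mu(\lambda_f) - \lambda_f}\right). \]

The expression for $r_\mu$ simplifies somewhat for the solution to the arbitrary duration problem, for which $\chi_\mu(\lambda_f) = 1$ and hence

\[ r_\mu(\lambda) = \frac{1}{2}\left(\lambda + \sqrt{1 - \mu^2} + \bigl(1 - \chi_\mu(\lambda)\bigr)\lambda + \mu^2 \ln\frac{1 - \sqrt{1 - \mu^2}}{\chi_\mu(\lambda) + \lambda}\right). \tag{3.8} \]

In this case,

\[ r_\mu(\lambda_f) = \sqrt{1 - \mu^2} + \mu^2 \ln\frac{1 - \sqrt{1 - \mu^2}}{\mu}. \tag{3.9} \]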



We plot some information about the solutions for some sample values of $\mu$ in Figure 3.2. To facilitate comparison, we plot $x_\mu$ and its derivatives using the scaled time $s := t/t_f(\mu)$. Figure 3.3 shows the total cost and duration for the solutions of the arbitrary time synthesis problem, plotted as functions of the moderation parameter $\mu$.

As the moderation parameter goes to zero, the EMI control problem approaches the time minimization one:

\[ \lim_{\mu \to 0}\,(C_\mu, \sigma_\mu, x_\mu) = (C_{tm}, \sigma_{tm}, x_{tm}). \]

In the limit $\mu \to 1$, the solutions of the arbitrary duration EMI problem approach those of equal duration of the quadratic incentive synthesis problem. Specifically,

\[ \lim_{\mu \to 1} \frac{x_\mu(t)}{x_q(t; t_f(\mu))} = 1 \quad\text{for } 0 < t < t_f(\mu), \]

while $\lambda_f = \sqrt{1 - \mu^2}$ and (3.9) imply that

\[ \lim_{\mu \to 1} t_f(\mu)^2 \sqrt{1 - \mu^2} = \lim_{\lambda_f \to 0} \frac{4\lambda_f^3}{r_\mu(\lambda_f)} = 6. \]

(Recall that $\sqrt{6}$ is the duration of the solution of the arbitrary duration quadratic incentive problem.) As suggested by these limits and Figure 3.2, the family of EMI solutions can be regarded as linking those of the trivial and quadratic incentive problems.
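
The two limits can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the paper's own computation: it takes $\lambda_f = \sqrt{1 - \mu^2}$, evaluates $r_\mu(\lambda_f) = \sqrt{1-\mu^2} + \mu^2 \ln\bigl((1 - \sqrt{1-\mu^2})/\mu\bigr)$, and assumes that the duration relation $t_f = 2\lambda_f/\sqrt{r(\lambda_f)}$ used for the time minimization case carries over to the elliptic incentives. All function names are hypothetical.

```python
import math

def lam_f(mu):
    """Terminal value of the auxiliary variable, assumed to be sqrt(1 - mu^2)."""
    return math.sqrt(1.0 - mu * mu)

def r_mu_final(mu):
    """r_mu(lam_f) = lam_f + mu^2 ln((1 - lam_f)/mu).

    The logarithm is evaluated as ln(mu/(1 + lam_f)), which is the same number
    because (1 - lam_f)(1 + lam_f) = mu^2, but avoids cancellation for small mu.
    """
    lf = lam_f(mu)
    return lf + mu * mu * math.log(mu / (1.0 + lf))

def duration(mu):
    """Assumed duration relation t_f = 2 lam_f / sqrt(r_mu(lam_f))."""
    return 2.0 * lam_f(mu) / math.sqrt(r_mu_final(mu))

if __name__ == "__main__":
    for mu in (1e-3, 0.25, 0.5, 0.75, 0.999):
        tf = duration(mu)
        print(f"mu={mu:7.3f}  t_f={tf:9.5f}  t_f^2*sqrt(1-mu^2)={tf * tf * lam_f(mu):8.5f}")
```

For small $\mu$ the printed duration should be close to the bang-bang value 2, and for $\mu$ near 1 the last column should be close to 6.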



Remark: For the EMI synthesis problem with $0 < \mu < 1$, the pair of second order ODEs (2.12) is equivalent to the fourth order ODE

\[ \mu\,\frac{d^2}{dt^2}\left(\frac{\ddot x}{\sqrt{1 - |\ddot x|^2}}\right) = -\nabla C(x); \tag{3.10} \]

see Corollary 1. In the present example, with constant $C$, the (nonzero) moderation parameter $\mu$ plays no role in (3.10), and hence has no influence on the solution of the specified duration problem. The parameter does, however, affect the solution of the arbitrary duration problem. The analog of the condition $\chi(|\lambda_0|) = C(x_0)$ for an EMI controlled acceleration problem with zero initial velocity is the initial acceleration condition

\[ \frac{\mu}{\sqrt{1 - |\ddot x(0)|^2}} = C(x_0), \quad\text{i.e.}\quad |\ddot x(0)| = \frac{\sqrt{C(x_0)^2 - \mu^2}}{C(x_0)}. \]
Fig. 3.3. EMI solutions: duration, $t_f(\mu)$, and total cost, $\mathrm{cost}(\mu)$.



The solution to the specified duration problem with cost function $C_\mu$, $0 < \mu < 1$, and duration $T > 2$ is $x_{\mu'}$, where $\mu'$ is determined by $t_f(\mu') = T$. Thus the expressions (3.8) and (3.9) are sufficient to determine the solutions for both the specified and arbitrary duration problems; as in the time minimization case, there is some redundancy in the $(x, \lambda)$ formulation.

3.2. A one-dimensional controlled acceleration example: spooking. We consider a generalization of the controlled acceleration example from Section 3.1, adding a position penalty to the cost function; specifically, $C : \mathbb{R} \to [1, \infty)$, with $C(1) = 1$ (the target is a 'no-cost' position) and $C(x) > 1$ for $x \in [0, 1)$. This can be regarded as a very simple model of spooking: the reaction of, e.g., a grazing herbivore to unexpected abrupt noise or motion. When startled, the animal will initially rush away from the disturbance; when it has reached its 'comfort zone', it will either turn to examine the threat or resume grazing. In our simple one-dimensional model, the function $C(x)$ acts as a 'fear factor', modeling the undesirability of remaining near the perceived threat and providing an incentive to move rapidly to the comfort point $x = 1$.

We consider the position-dependent cost function

\[ C(x) = 1 + \frac{c}{2}(1 - x)^2. \]

The boundary conditions $x(0) = \dot x(0) = \dot x(t_f) = 0$ and $x(t_f) = 1$ imply that a solution $(x, \lambda)$ of the arbitrary duration problem satisfies

\[ \chi(\lambda_0) = C(0) \quad\text{and}\quad \chi(\lambda_f) = C(1). \tag{3.11} \]

In addition, $\lambda$ must change sign at least once, since the boundary conditions for $x$ imply that $\mathrm{sgn}\,\ddot x = \mathrm{sgn}\,\lambda$ must change at least once. We assume that $\ddot x(0)$ is positive, and hence $\lambda_0 = \chi^{-1}\!\left(1 + \frac{c}{2}\right)$.

We first analyse the arbitrary duration synthesis problem for the moderation incentives $C_\mu$. Lacking closed form solutions for either the second order system

\[ \ddot x = \frac{\lambda}{\sqrt{\mu^2 + \lambda^2}} \quad\text{and}\quad \ddot\lambda = c\,(1 - x), \tag{3.12} \]

derived from (2.12) for $C(x) = 1 + \frac{c}{2}(1 - x)^2$ and $C_\mu$, or the fourth order ODE

\[ \mu\,\frac{d^2}{dt^2}\left(\frac{\ddot x}{\sqrt{1 - \ddot x^2}}\right) = c\,(1 - x) \tag{3.13} \]

derived from (2.13), we turn to the reparametrized form of the system (3.1), which can be implemented using standard numerical BVP routines. (Our numerical approximations were computed using the built-in Mathematica function NDSolve.) The boundary conditions on $\lambda$ take the form

\[ \lambda_0 = \lambda_0(c, \mu) := \sqrt{\left(1 + \tfrac{c}{2}\right)^2 - \mu^2} \quad\text{and}\quad \lambda_f = \pm\sqrt{1 - \mu^2}. \tag{3.14} \]

Fig. 3.4. Position and acceleration plots for the solutions of the EMI synthesis problem for several values of $c$, including 1 and 5, and of $\mu$, including 1.

A solution of the reparametrized problem determines a solution of the synthesis problem such that the auxiliary function $\lambda$ has nonzero first derivative; $\lambda$ changes sign at most once (and hence exactly once) in this situation. Hence we take $\lambda_f = -\sqrt{1 - \mu^2}$. The reparametrized problem takes the form of seeking a solution $(r, q) : [\lambda_0, \lambda_f] \to \mathbb{R} \times \mathbb{R}_+$ of the BVP

\[ q\,r' + 1 + \frac{c}{2}(1 - r)^2 = \sqrt{\mu^2 + \lambda^2} \quad\text{and}\quad q' = 2c\,(1 - r), \]

with boundary conditions $r(\lambda_0) = 0$ and $r(\lambda_f) = 1$ for $\lambda_0 = \lambda_0(c, \mu)$ and $\lambda_f = -\sqrt{1 - \mu^2}$. This BVP can be solved numerically (using, e.g., a shooting method). Once a solution has been found, the elapsed time can be found as a function of $\lambda$ by numerically computing the integral

\[ t(\lambda) = \int_{\lambda}^{\lambda_0} \frac{ds}{\sqrt{q(s)}}. \]

The desired solution $x$ is given by $x = r \circ t^{-1}$. Figure 3.4 shows some sample plots for several values of $c$, including 1 and 5, and of $\mu$, including 1.
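
A shooting approach to the reparametrized BVP can be sketched as follows. This is a pure-Python illustration (fixed-step RK4 plus bisection), not the Mathematica NDSolve computation used in the paper. It assumes the system $q r' = \sqrt{\mu^2 + \lambda^2} - 1 - \frac{c}{2}(1 - r)^2$, $q' = 2c(1 - r)$ with $r(\lambda_0) = 0$, $r(\lambda_f) = 1$, that $\lambda$ decreases from $\lambda_0 = \sqrt{(1 + c/2)^2 - \mu^2}$ to $\lambda_f = -\sqrt{1 - \mu^2}$, and that $q = \dot\lambda^2$, so that elapsed time is the integral of $1/\sqrt{q}$. The unknown initial value $q(\lambda_0)$ is the shooting parameter; the bracket and step count are arbitrary choices.

```python
import math

def endpoints(c, mu):
    """Assumed boundary values of lambda: chi(lam0) = 1 + c/2, chi(lamf) = 1."""
    return math.sqrt((1.0 + 0.5 * c) ** 2 - mu * mu), -math.sqrt(1.0 - mu * mu)

def rhs(lam, r, q, c, mu):
    """Right-hand side (dr/dlam, dq/dlam) of the reparametrized system."""
    if q <= 0.0:
        raise ValueError("q must stay positive")
    dr = (math.sqrt(mu * mu + lam * lam) - 1.0 - 0.5 * c * (1.0 - r) ** 2) / q
    return dr, 2.0 * c * (1.0 - r)

def integrate(q0, c, mu, n=2000):
    """RK4 from lam0 down to lamf; returns (r(lamf), elapsed time) or (inf, inf)."""
    lam0, lamf = endpoints(c, mu)
    h = (lamf - lam0) / n  # negative step: lambda decreases along the solution
    lam, r, q, t = lam0, 0.0, q0, 0.0
    try:
        for _ in range(n):
            k1 = rhs(lam, r, q, c, mu)
            k2 = rhs(lam + h / 2, r + h / 2 * k1[0], q + h / 2 * k1[1], c, mu)
            k3 = rhs(lam + h / 2, r + h / 2 * k2[0], q + h / 2 * k2[1], c, mu)
            k4 = rhs(lam + h, r + h * k3[0], q + h * k3[1], c, mu)
            r_new = r + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            q_new = q + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
            if q_new <= 0.0:
                raise ValueError("q must stay positive")
            # trapezoid rule for dt = |dlam| / sqrt(q)
            t += 0.5 * abs(h) * (1.0 / math.sqrt(q) + 1.0 / math.sqrt(q_new))
            lam, r, q = lam + h, r_new, q_new
    except (ValueError, ZeroDivisionError, OverflowError):
        return math.inf, math.inf
    return r, t

def solve_bvp(c, mu, lo=1e-6, hi=100.0, iters=80):
    """Bisect on q(lam0) until r(lamf) = 1; returns (q0, r(lamf), t_f)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        r_end, _ = integrate(mid, c, mu)
        if not (r_end <= 1.0):  # overshoot (or failed integration): q0 too small
            lo = mid
        else:
            hi = mid
    q0 = 0.5 * (lo + hi)
    r_end, t_f = integrate(q0, c, mu)
    return q0, r_end, t_f

if __name__ == "__main__":
    for c in (0.0, 0.1, 1.0):
        print(c, solve_bvp(c, 0.5))
```

For $c = 0$ the function $q$ is constant, and the value found by the search can be compared with the closed form $\sqrt{1-\mu^2} + \mu^2 \ln\bigl(\mu/(1 + \sqrt{1-\mu^2})\bigr)$, which provides a consistency check on the sketch.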

Remark: We shall see that, as in the previous example, the total cost function for the QCC problem is 
a monotonically decreasing function of the maneuver duration, with countably many inflection points, 
corresponding to trajectories that oscillate about the target before coming to rest. We do not rule out 
the possibility that such oscillations may also lead to a decrease in cost for the elliptical moderation 
incentives; our numerical searches are directed only towards the identification of trajectories that do not 
overshoot the target. 

Note that Figure 3.4 shows little response to the position penalty strength $c$ for small values of $\mu$: the solutions remain close to the corresponding solutions for $c = 0$. In Figure 3.5 we plot $x_\mu(t; c) - x_\mu(t; 0)$ for three values of $\mu$ and for two values of $c$, one of which is 1. On the other hand, the solutions for $\mu$ near 1 are strongly influenced
by the position penalty; the initial acceleration increases dramatically with c and the acceleration curve 
changes from concave to convex as c increases. A sufficiently frightening event will provoke an initially 
strong response even when the moderation incentive is high, but the later stages of the recovery from 
that initial response are largely determined by the moderation incentive. This roughly agrees with 
actual spooking behavior: when startled, an inexperienced animal will often respond by first bolting, 
then abruptly halting and whirling about to examine the apparent threat while still in a state of high 
excitement; a more experienced one can still be spooked, but quickly regains its composure in the absence 
of real danger and gradually decelerates, reducing the significant skeletomuscular stresses of a hard stop 
and risk of self-inflicted injury during rapid motion. 



Fig. 3.5. $x_\mu(t; c) - x_\mu(t; 0)$ for three values of $\mu$. Left: the smaller value of $c$; right: $c = 1$.



The quadratic polynomial moderation incentive $C_q$ yields a piecewise smooth solution of the arbitrary time synthesis problem that is simpler in some regards, but less convenient in others, than the solutions for the elliptical incentives. As we shall see, the controls for the $C_q$ solutions satisfying (2.12) are continuous, but not everywhere differentiable if the parameter $c$ in the position-dependent component of the cost function is nonzero; the solution behaves like the time-minimization solution, with control identically equal to one, for the first part of the maneuver, then satisfies a linear fourth order ODE for the remainder of the maneuver.



Equation (2.10) implies that the restriction of $\chi_q$ to $[1, \infty)$ is the identity map; hence the initial condition $\chi_q(\lambda_0) = C(0) = 1 + \frac{c}{2}$ implies that $|\lambda_0| = 1 + \frac{c}{2}$ and $|\ddot x(0)| = 1$. A similar argument shows that the terminal condition $\chi_q(\lambda_f) = C(1) = 1$ implies that $|\lambda_f| = 1$. Since $\dot x(0) = \dot x(t_f)$ implies that $\ddot x$, and hence $\lambda$, changes sign, $\lambda$ must pass through zero; hence there exists $t_* > 0$ such that $\lambda(t) \ge 1$ for $0 \le t \le t_*$ and $\lambda(t) < 1$ on some interval $(t_*, t_* + \epsilon)$. We find solutions to the arbitrary duration problem satisfying $1 > \lambda > -1$ on $(t_*, t_f)$, with $\lambda(t_f) = -1$. (As we shall show later, there are solutions to the specified duration synthesis problem that overshoot and oscillate about the target.)

The second order system (2.12) equals that for $C_0$ when $|\lambda| \ge 1$: $x(t)$ satisfies $\ddot x = 1$ on the interval $[0, t_*]$; the initial conditions $x(0) = \dot x(0) = 0$ imply that $x(t) = \frac{t^2}{2}$ on $[0, t_*]$. On $(t_*, t_f]$, $\ddot x = \chi_q'(\lambda) = \lambda$, and hence $x^{(4)} = \ddot\lambda = c\,(1 - x)$.
The linear fourth order ODE $y'''' + 4y = 0$ has the general solution

\[ y(s) = \begin{pmatrix} \cosh s \\ \sinh s \end{pmatrix}^{T} M \begin{pmatrix} \cos s \\ \sin s \end{pmatrix} \tag{3.15} \]

for an arbitrary matrix $M \in \mathbb{R}^{2 \times 2}$; (3.15) satisfies the initial conditions $y(0) = y_0$, $y'(0) = v_0$, and $y''(0) = a_0$ if and only if

\[ M = \begin{pmatrix} y_0 & \frac{v_0 + m}{2} \\ \frac{v_0 - m}{2} & \frac{a_0}{2} \end{pmatrix} \tag{3.16} \]

for some $m \in \mathbb{R}$.

Fig. 3.6. Position, velocity, and acceleration plots for the solutions of the synthesis problem with moderation incentive $C_q$, for two values of $c$, one of which is 5.

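We seek $t_*$, $t_f$, and $M$ such that

\[ x(t) = \begin{cases} \frac{t^2}{2}, & 0 \le t \le t_*, \\ 1 - y\!\left(\frac{c^{1/4}}{\sqrt{2}}\,(t - t_*)\right), & t_* < t \le t_f \end{cases} \tag{3.17} \]

is twice differentiable and satisfies the boundary conditions $x(t_f) = 1$, $\dot x(t_f) = 0$, and $\ddot x(t_f) = -1$. Hence we require

\[ y_0 = 1 - \frac{t_*^2}{2}, \qquad v_0 = -\frac{\sqrt{2}}{c^{1/4}}\,t_*, \qquad\text{and}\qquad a_0 = -\left(c^{-1/4}\sqrt{2}\right)^2 \ddot x(t_*) = -\frac{2}{\sqrt{c}}. \]

If we set $s_f := \frac{c^{1/4}}{\sqrt{2}}\,(t_f - t_*)$, then the terminal conditions are

\[ y(s_f) = y'(s_f) = 0 \quad\text{and}\quad y''(s_f) = -\left(c^{-1/4}\sqrt{2}\right)^2 \ddot x(t_f) = \frac{2}{\sqrt{c}}. \tag{3.18} \]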



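Solving (3.18) for $y_0$, $v_0$, and $m$, given $a_0 = -\frac{2}{\sqrt{c}}$, yields

\[ \sqrt{c}\begin{pmatrix} y_0 \\ v_0 \\ m \end{pmatrix} = \begin{pmatrix} \hat y_0(s_f) \\ \hat v_0(s_f) \\ \hat m(s_f) \end{pmatrix} := \frac{1}{\cosh s_f \sin s_f + \cos s_f \sinh s_f} \begin{pmatrix} (\cosh s_f + \cos s_f)(\sinh s_f - \sin s_f) \\ -(\sinh s_f - \sin s_f)^2 \\ (\cosh s_f + \cos s_f)^2 \end{pmatrix}. \]

The matching condition $y_0 = 1 - \frac{t_*^2}{2} = 1 - \frac{\sqrt{c}}{4}\,v_0^2$, equivalently

\[ \hat y_0(s_f) + \frac{1}{4}\,\hat v_0(s_f)^2 = \sqrt{c}, \]

implicitly determines a function $s_f : \mathbb{R}_+ \to (0, \bar s_f)$, where $\tan \bar s_f + \tanh \bar s_f = 0$. The functions

\[ t_*(c) = -\frac{\hat v_0(s_f(c))}{\sqrt{2}\,c^{1/4}} \quad\text{and}\quad t_f(c) = t_*(c) + \frac{\sqrt{2}}{c^{1/4}}\,s_f(c) \]

give the desired transition time and duration. The curves $t_*(c)$ and $t_f(c)$ are shown in Figure 3.7; plots of the position, velocity, and acceleration for some representative values of $c$ are shown in Figure 3.6.

Fig. 3.7. Transition time $t_*$ and total duration $t_f$ for the quadratic moderation incentive $C_q$.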
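
The transition time and duration can be computed from the matching condition with a simple root search. The sketch below is an illustration, not the paper's code. It assumes the rescaled coefficients $\hat y_0(s) = (\cosh s + \cos s)(\sinh s - \sin s)/P(s)$ and $\hat v_0(s) = -(\sinh s - \sin s)^2/P(s)$ with $P(s) = \cosh s \sin s + \cos s \sinh s$, the matching condition $\hat y_0(s_f) + \frac{1}{4}\hat v_0(s_f)^2 = \sqrt{c}$, and the relations $t_* = -\hat v_0(s_f)/(\sqrt{2}\,c^{1/4})$, $t_f = t_* + \sqrt{2}\,s_f/c^{1/4}$. Function names and the bisection bracket are hypothetical choices.

```python
import math

def P(s):
    """Common denominator; its first positive root solves tan s + tanh s = 0."""
    return math.cosh(s) * math.sin(s) + math.cos(s) * math.sinh(s)

def y0_hat(s):
    return (math.cosh(s) + math.cos(s)) * (math.sinh(s) - math.sin(s)) / P(s)

def v0_hat(s):
    return -((math.sinh(s) - math.sin(s)) ** 2) / P(s)

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Upper end of the admissible range of s_f (first positive zero of P).
S_BAR = bisect(P, 2.0, 3.0)

def transition_and_duration(c):
    """Return (t_star, t_f, s_f) for penalty strength c > 0 (assumed formulas)."""
    target = math.sqrt(c)

    def mismatch(s):
        return y0_hat(s) + 0.25 * v0_hat(s) ** 2 - target

    s_f = bisect(mismatch, 1e-3, S_BAR * (1.0 - 1e-12))
    scale = math.sqrt(2.0) / c ** 0.25
    t_star = -0.5 * scale * v0_hat(s_f)
    return t_star, t_star + scale * s_f, s_f

if __name__ == "__main__":
    for c in (1e-6, 0.5, 1.0, 5.0):
        print(c, transition_and_duration(c))
```

For very small $c$ the transition time shrinks toward zero and the duration approaches $\sqrt{6}$, the arbitrary duration value found for the quadratic incentive without a position penalty.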

We now briefly consider the alternative cost function $C_{qcc}(x, u) = \frac{u^2}{2} + \frac{c}{2}(1 - x)^2$. This cost function, in which the term $\frac{u^2}{2}$ is naturally interpreted as a positive control cost added to the position-dependent cost, differs from the $C_q$ moderated problem analysed above only by the constant $\frac{1}{2}$. Thus solutions of the synthesis problem for one cost function are solutions for the other, as well. A solution of the synthesis problem for the moderation incentive $C_q$ is a solution for the arbitrary duration synthesis problem for $C_{qcc}$ if and only if the Hamiltonian determined by $C_{qcc}$ is equal to zero along the solution; equivalently: if and only if $\ddot x(0)^2 = c$ and $\ddot x(t_f) = 0$. If we restrict our attention to $c \le 1$, then this condition on the initial acceleration is compatible with the general control constraint $|\ddot x| \le 1$. We seek $M \in \mathbb{R}^{2 \times 2}$ and $s_f \in \mathbb{R}_+$ such that $y(s)$ given by (3.15) satisfies the boundary conditions $y(0) = 1$, $y'(0) = y(s_f) = y'(s_f) = y''(s_f) = 0$, and $y''(0) = -2$, and hence

\[ x(t) = 1 - y\!\left(\frac{c^{1/4}}{\sqrt{2}}\,t\right) \]

satisfies the boundary conditions of the arbitrary duration synthesis problem for $C_{qcc}$, where $t_f = c^{-1/4}\sqrt{2}\,s_f$, as before. After substituting $y_0 = 1$, $v_0 = 0$, and $a_0 = -2$ into (3.16), we find that the terminal conditions $y(s_f) = y'(s_f) = y''(s_f) = 0$ are satisfied if and only if $m = 2\coth s_f$ and $\sin s_f = 0$. Thus the solutions of the arbitrary duration synthesis problem for $C_{qcc}$ have duration $t_f(c, k) := c^{-1/4}\sqrt{2}\,\pi k$, $k \in \mathbb{N}$. If $k > 1$, the solution of duration $t_f(c, k)$ overshoots the destination, making oscillations about the target before stopping.

We now show that none of these solutions of the arbitrary duration problem for $C_{qcc}$ actually minimize the total cost

\[ C_{tot}(t_f) := \int_0^{t_f} C_{qcc}(x, \ddot x)\,dt = \frac{c^{3/4}\,m}{2\sqrt{2}}, \quad\text{with derivative}\quad C_{tot}'(t_f) = -\frac{1}{2}\,\ddot x(t_f)^2. \]

Thus the cost is nonincreasing and the solutions of duration $t_f(c, k)$ are all inflection points, satisfying $C_{tot}(t_f(c, k)) = \frac{c^{3/4}}{\sqrt{2}}\coth k\pi$. Hence increases in the total maneuver time yield exponentially small reductions in cost.

Fig. 3.8. Solutions of the arbitrary duration synthesis problem for $C_{qcc}$, for c = .25, .5, .75, 1 and k = 1.

Fig. 3.9. Parametric plots $(x, \dot x)$ of QCC solutions for c = 1, $t_f = k\sqrt{2}\,\pi$. Left: k = 1 and 2; right: close-up of k = 2.
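
The QCC solutions and their cost can be checked numerically. The following sketch is an illustration under stated assumptions, not the paper's computation: it takes the rescaled solution $y(s) = \cosh s \cos s - \sinh s \sin s + \coth(k\pi)\,(\cosh s \sin s - \sinh s \cos s)$ on $[0, k\pi]$, with $x(t) = 1 - y(c^{1/4} t/\sqrt{2})$, and compares a Simpson approximation of the total cost with the closed form $c^{3/4}\coth(k\pi)/\sqrt{2}$. All function names are hypothetical.

```python
import math

def kappa(k):
    """coth(k*pi), so that m = 2*kappa(k) in the assumed parametrization."""
    return 1.0 / math.tanh(k * math.pi)

def y(s, k=1):
    ch, sh, co, si = math.cosh(s), math.sinh(s), math.cos(s), math.sin(s)
    return ch * co - sh * si + kappa(k) * (ch * si - sh * co)

def yp(s, k=1):
    """First derivative of y."""
    return 2.0 * math.sin(s) * (kappa(k) * math.sinh(s) - math.cosh(s))

def ypp(s, k=1):
    """Second derivative of y."""
    ch, sh, co, si = math.cosh(s), math.sinh(s), math.cos(s), math.sin(s)
    return -2.0 * sh * si - 2.0 * ch * co + 2.0 * kappa(k) * (sh * co + ch * si)

def total_cost_numeric(c, k=1, n=4000):
    """Simpson rule for the integral of x''^2/2 + (c/2)(1-x)^2 dt in the rescaled time s."""
    s_f = k * math.pi
    h = s_f / n  # n must be even
    acc = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        s = i * h
        acc += weight * (ypp(s, k) ** 2 / 8.0 + y(s, k) ** 2 / 2.0)
    return math.sqrt(2.0) * c ** 0.75 * acc * h / 3.0

def total_cost_closed(c, k=1):
    return c ** 0.75 * kappa(k) / math.sqrt(2.0)

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, total_cost_numeric(0.5, k), total_cost_closed(0.5, k))
```

Since $\coth k\pi$ decreases to 1 exponentially fast in $k$, the printed costs decrease with $k$ by rapidly shrinking amounts, in line with the statement that longer maneuvers give exponentially small cost reductions.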

For relatively small values of $c$, Figures 3.4 and 3.8 show a qualitative resemblance between the moderated solutions for $C_1$ and the solutions for $C_{qcc}$. To facilitate the comparison of these solutions, we reparametrize the $C_1$ solutions using the rescaled time $s := \frac{c^{1/4}}{\sqrt{2}}\,t$. The rescaling of the fourth order evolution equation (3.13) takes the form

\[ \frac{d^2}{ds^2}\left(\frac{x''}{\sqrt{1 - \frac{c}{4}\,(x'')^2}}\right) = 4\,(1 - x). \]

Hence the moderated solution approaches that of the QCC problem as $\mu \to 1$ and $c \to 0$, but the two families are not equal for nonzero $c$. See Figure 3.10 for a comparison of the controls for the two families for some representative values of $c$.

4. Conclusions and future work. Our choices of state space, vector fields, admissible control re- 
gions, and unmoderated cost functions were intended to be the simplest possible. We intend to generalize 
each of these components of the synthesis problem. 

Optimal control on nonlinear manifolds has received significant attention in recent years, particularly 
situations in which the controls can be modeled as elements of a distribution within the tangent bundle 
of the state manifold, corresponding to (partially) controlled velocities. See, e.g. [121 HSl UHl [3] , and refer- 
ences therein. Analogous constructions for (partially) controlled higher order derivatives (e.g. controlled 
acceleration) can be implemented using jet bundles, but can be unwieldy in practical implementations.
We are particularly interested in Lie groups, since these manifolds possess additional structure that fa- 
cilitates the identification of the controls with elements of a single vector space, the Lie algebra. Results 
for conservative systems on Lie groups, homogeneous manifolds, and associated bundles should be easily 
extended to optimal control problems. Geometric integration schemes for the numerical integration of 
Hamiltonian systems can be used to approximate the solutions of synthesis problems on such manifolds; 
see [m H EOl El [To] and references therein. 



The conservation law (2.6) can play a crucial role in the exact or approximate solution of the synthesis problem. In [8] we develop several results, including a reduction of the evolution equation for the auxiliary variables to the sphere, that exploit the conservation law. We show that a simple vertical take-off interception model with controlled velocities can be reduced to quadratures using this approach. We intend to generalize these results and apply them to more complex systems in future work.

The assumption that the admissible control set is the unit ball can be relaxed to $\{u \in V : f(u) \le c\}$ for some vector space $V$, differentiable function $f : V \to [f_{\min}, \infty)$ with $\nabla f$ everywhere nonzero on the boundary $\partial\mathcal{U}$ of this set, and constant $c \in \mathbb{R}$. The analog of $C_q$ would be $C(u) = c - f(u)$; the analogs of $C_\mu$ would be $\mu\sqrt{c - f(u)}$. Clearly the property that the control $u$ would be a rescaling of the auxiliary variable $\lambda$ would not hold; hence the determination of the value of $u$ maximizing the Hamiltonian would, in general, be more complicated than in the case considered here. More generally, the incentive could be a function of both control and state variables, retaining the property that the function rewards avoidance of the boundary of the admissible control set.



We intend to investigate the skewed gradient equations (2.12) in greater detail, seeking both analytic 
properties and efficient numerical schemes. We hope to model various biomechanical systems using 
optimal control formulations of the type described here, investigating the utility of moderation incentives 
in the interpretation of animal motion and behavior. The construction of families of optimal solutions 
parametrized by moderation or urgency may shed light on aspects of motion planning that are not easily 
understood using a single cost function. For example, most of us utilize a range of strategies when lifting 
and carrying objects: a newborn, a laptop, and a phone book merit different levels of caution. Significant 
expenditure of energy is required to capture prey or evade a predator, but exhaustion and lameness leave 
both predator and prey vulnerable to future attacks and starvation — long-term survival depends on the 
adjustment of resource consumption to the demands of each encounter. Even very simple mathematical 
models can add to our understanding of the adaptability of natural control systems. 